feat: implement --served-model-name CLI option #158
Conversation
Code Review
This pull request introduces support for multiple served model names in the API, allowing the server to respond to several identifiers while designating the first as the primary ID for responses. Changes include adding the --served-model-name CLI argument, updating the server configuration and state to handle a list of names, and modifying request validation logic across HTTP and gRPC routes. Additionally, the /v1/models endpoint now returns all configured names. Feedback was provided regarding the use of debug_assert! for validating the non-empty invariant of served model names, suggesting a more robust check for production builds.
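The review point about `debug_assert!` can be made concrete with a small sketch. This is a hypothetical illustration of the suggested fix, not the PR's actual code: `ServerState`, `new`, and `is_served` are invented names, and the idea is simply that a `debug_assert!` on the non-empty invariant compiles away in release builds, so an unconditional check is safer.

```rust
// Hypothetical sketch of the invariant check discussed in the review.
#[derive(Debug)]
struct ServerState {
    /// Non-empty; the first entry is the primary name echoed in responses.
    served_model_names: Vec<String>,
}

impl ServerState {
    fn new(served_model_names: Vec<String>) -> Result<Self, String> {
        // A `debug_assert!(!served_model_names.is_empty())` would vanish in
        // release builds; validate unconditionally instead.
        if served_model_names.is_empty() {
            return Err("served model names must be non-empty".to_string());
        }
        Ok(Self { served_model_names })
    }

    /// The primary name used as the `model` field in API responses.
    fn primary_name(&self) -> &str {
        &self.served_model_names[0]
    }

    /// Whether a request's `model` field matches any served name.
    fn is_served(&self, name: &str) -> bool {
        self.served_model_names.iter().any(|n| n == name)
    }
}

fn main() {
    let state = ServerState::new(vec!["qwen3".into(), "my-alias".into()]).unwrap();
    assert_eq!(state.primary_name(), "qwen3");
    assert!(state.is_served("my-alias"));
    assert!(!state.is_served("unknown"));
    assert!(ServerState::new(vec![]).is_err());
    println!("ok");
}
```

Returning a `Result` keeps the invariant enforced in production builds while still letting the caller surface a clear configuration error at startup.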
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13e0d4e359
Force-pushed from 13e0d4e to e290867
@BugenZhao PTAL
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e29086775a
Add --served-model-name support, matching vLLM's behavior:
- Accept zero or more alias names via --served-model-name
- GET /v1/models returns one entry per served name
- POST completions/chat endpoints accept any served name in the `model` field and echo back the first (primary) name in responses
- gRPC and /inference/v1/generate validate against all served names
- Fall back to the backend model path when no names are specified
- Remove served_model_name from UnsupportedArgs now that it is fully implemented
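The "/v1/models returns one entry per served name" item above can be sketched as follows. This is an illustrative sketch only: `ModelEntry` and `list_models` are hypothetical names, not the crate's actual types, and the real endpoint would serialize these entries as JSON.

```rust
// Hypothetical sketch of listing one /v1/models entry per served name.
#[derive(Debug, PartialEq)]
struct ModelEntry {
    id: String,
    object: &'static str,
}

fn list_models(served_names: &[String]) -> Vec<ModelEntry> {
    served_names
        .iter()
        .map(|name| ModelEntry {
            id: name.clone(),
            object: "model",
        })
        .collect()
}

fn main() {
    let names = vec!["qwen3".to_string(), "my-alias".to_string()];
    let models = list_models(&names);
    // One entry per served name, in the configured order.
    assert_eq!(models.len(), 2);
    assert_eq!(models[0].id, "qwen3");
    assert_eq!(models[1].id, "my-alias");
    println!("ok");
}
```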
Force-pushed from e290867 to 26449b1
Summary
Behavior
When no `--served-model-name` is given, the backend model path is used as the single served name (no change in default behavior).
```shell
vllm-rs serve Qwen/Qwen3-0.6B --served-model-name qwen3 my-alias
```
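The fallback described above (backend model path used as the single served name when no aliases are given) reduces to a small resolution step. The function name here is a hypothetical sketch, not the PR's actual code:

```rust
// Sketch of the fallback: with no --served-model-name, the backend model
// path becomes the single served name, preserving the old default behavior.
fn resolve_served_names(model_path: &str, served: &[String]) -> Vec<String> {
    if served.is_empty() {
        vec![model_path.to_string()]
    } else {
        served.to_vec()
    }
}

fn main() {
    // No aliases: fall back to the model path.
    let names = resolve_served_names("Qwen/Qwen3-0.6B", &[]);
    assert_eq!(names, vec!["Qwen/Qwen3-0.6B".to_string()]);

    // Aliases given: the list is used as-is; the first entry is primary.
    let aliases = vec!["qwen3".to_string(), "my-alias".to_string()];
    let names = resolve_served_names("Qwen/Qwen3-0.6B", &aliases);
    assert_eq!(names[0], "qwen3");
    assert_eq!(names.len(), 2);
    println!("ok");
}
```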
Review notes
The `EngineCoreClientConfig::model_name` intentionally keeps the backend model path (`config.model`) rather than the first served alias — it is used for the engine protocol handshake, not for API labeling.
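The split noted above can be pictured with a minimal sketch. The field layouts here are illustrative assumptions, not the crate's real definitions: only the names `EngineCoreClientConfig` and `model_name` come from the PR, and the point is just that the engine handshake carries the backend path while API responses use the first served alias.

```rust
// Illustrative sketch (not the actual vllm-rs types) of the separation:
// the engine protocol keeps the backend model path, the API layer labels
// responses with the first served alias.
struct EngineCoreClientConfig {
    /// Backend model path, used for the engine protocol handshake.
    model_name: String,
}

struct ApiState {
    /// Served aliases; the first entry labels API responses.
    served_model_names: Vec<String>,
}

fn main() {
    let engine = EngineCoreClientConfig {
        model_name: "Qwen/Qwen3-0.6B".to_string(),
    };
    let api = ApiState {
        served_model_names: vec!["qwen3".to_string(), "my-alias".to_string()],
    };
    // The two layers intentionally carry different names.
    assert_eq!(engine.model_name, "Qwen/Qwen3-0.6B");
    assert_eq!(api.served_model_names[0], "qwen3");
    println!("ok");
}
```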